    A MULTIPROTOCOL AUTOMATIC

    No full text
    High performance computing platforms such as Clusters, Grids, and Desktop Grids are becoming larger and subject to more frequent failures. MPI is one of the most widely used message passing libraries in HPC applications. These two trends raise the need for fault-tolerant MPI. The MPICH-V project focuses on designing, implementing, and comparing several automatic fault-tolerant protocols for MPI applications. We present an extensive related work section highlighting the originality of our approach and the proposed protocols. We then present four fault-tolerant protocols implemented in a new generic framework for fault-tolerant protocol comparison, covering a large spectrum of known approaches, from coordinated checkpointing to uncoordinated checkpointing associated with causal message logging.
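    The protocols compared in this abstract range from coordinated checkpointing to uncoordinated checkpointing with causal message logging. Purely as an illustration of the coordinated end of that spectrum (this is a minimal sketch, not MPICH-V's actual implementation), the C/MPI example below has every rank reach a common synchronization point before dumping a hypothetical application_state buffer to disk; the state buffer, file naming, and checkpoint interval are invented for the example.

    /* Hedged sketch of coordinated checkpointing with plain MPI.
     * Not the MPICH-V implementation; it only illustrates the idea that all
     * ranks agree on a checkpoint line before any of them saves its state.
     * A real protocol must also drain or log in-flight messages. */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical application state; real codes serialize their own data. */
    static double application_state[1024];

    static void take_coordinated_checkpoint(int rank, int step)
    {
        /* Global synchronization: all ranks agree to checkpoint here. */
        MPI_Barrier(MPI_COMM_WORLD);

        char filename[64];
        snprintf(filename, sizeof filename, "ckpt_rank%d_step%d.bin", rank, step);
        FILE *f = fopen(filename, "wb");
        if (f) {
            fwrite(application_state, sizeof application_state, 1, f);
            fclose(f);
        }

        /* Second barrier so every rank knows the global checkpoint is done. */
        MPI_Barrier(MPI_COMM_WORLD);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int step = 0; step < 10; ++step) {
            /* ... application computation and MPI communication ... */
            memset(application_state, 0, sizeof application_state);

            if (step % 5 == 4)           /* checkpoint every 5 steps */
                take_coordinated_checkpoint(rank, step);
        }

        MPI_Finalize();
        return 0;
    }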

    A Comparison of the Scalability of OpenMP Implementations

    No full text
    OpenMP implementations must exploit current and upcoming hardware for performance. Overhead must be controlled and kept to a minimum to avoid low performance at scale. Previous work has shown that overheads do not scale favourably in commonly used OpenMP implementations. Focusing on synchronization overhead, this work analyses the overhead of core OpenMP runtime library components for the GNU and LLVM compilers, reflecting on each implementation's source code and algorithms. In addition, this work investigates the implementations' capability to handle the CPU-internal NUMA structure observed in recent Intel CPUs. Using a custom benchmark designed to expose the synchronization overhead of OpenMP regardless of user code, substantial differences between the two implementations are observed. In summary, the LLVM implementation can be considered more scalable than the GNU implementation, although the GNU implementation yields lower overhead at lower thread counts on some occasions. Neither implementation reacts to the system architecture, although the effects of the internal NUMA structure on the overhead can be observed.
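    Synchronization overhead of the kind studied in this abstract is typically isolated by timing a reference loop without any OpenMP construct, timing the same loop with the construct added, and attributing the difference to the runtime. The C sketch below is our own hedged illustration of that pattern, not the paper's custom benchmark; it estimates the per-iteration cost of an explicit barrier.

    /* Hedged sketch of an OpenMP synchronization-overhead microbenchmark.
     * Not the paper's benchmark, only an illustration of the
     * "time the construct, subtract the reference loop" idea.
     * Compile with mild optimization (e.g. -O1) so the reference loop
     * is not removed entirely. */
    #include <omp.h>
    #include <stdio.h>

    #define REPS 10000

    int main(void)
    {
        /* Reference: the bare timing loop, without any OpenMP construct. */
        double t0 = omp_get_wtime();
        for (int i = 0; i < REPS; ++i) {
            /* empty body */
        }
        double t_ref = omp_get_wtime() - t0;

        /* Measured: the same loop, but every iteration hits a barrier,
         * executed by the full thread team. */
        double t1 = omp_get_wtime();
        #pragma omp parallel
        {
            for (int i = 0; i < REPS; ++i) {
                #pragma omp barrier
            }
        }
        double t_bar = omp_get_wtime() - t1;

        printf("threads=%d  barrier overhead ~ %.3f us per iteration\n",
               omp_get_max_threads(),
               (t_bar - t_ref) / REPS * 1e6);
        return 0;
    }

    Sweeping the thread count (for example via OMP_NUM_THREADS) and pinning threads across NUMA domains would expose the scaling and architecture effects the abstract discusses.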